Instant Translation Model Adaptation by Translating Unseen Words in Continuous Vector Space

Authors

  • Shonosuke Ishiwatari
  • Naoki Yoshinaga
  • Masashi Toyoda
  • Masaru Kitsuregawa
Abstract

In statistical machine translation (SMT), differences between the domains of training and test data result in poor translations. Although there have been many studies on domain adaptation of language models and translation models, most require supervised in-domain language resources such as parallel corpora for training and tuning the models. The need for supervised data has made such methods difficult to adapt to practical SMT systems. We thus propose a novel method that adapts translation models without in-domain parallel corpora. Our method infers translation candidates for unseen words by nearest-neighbor search after projecting their vector-based semantic representations into the semantic space of the target language. In our experiment on out-of-domain translation from Japanese to English, our method improved the BLEU score by 0.5-1.5.
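
The projection and nearest-neighbor step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the projection is learned, so this example assumes a linear map fitted by least squares on a small seed dictionary, and it ranks candidates by cosine similarity. The function names (`fit_projection`, `translation_candidates`), the toy 2-dimensional vectors, and the English vocabulary are all hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np


def fit_projection(src_seed, tgt_seed):
    """Fit a linear map W so that src_seed @ W approximates tgt_seed.

    Assumption: the projection is linear and learned by least squares
    from a seed dictionary of paired word vectors.
    """
    W, *_ = np.linalg.lstsq(src_seed, tgt_seed, rcond=None)
    return W


def translation_candidates(src_vec, W, tgt_matrix, tgt_words, k=3):
    """Project a source-word vector and return the k nearest target words."""
    projected = src_vec @ W
    # Cosine similarity between the projected vector and every target vector.
    tgt_unit = tgt_matrix / np.linalg.norm(tgt_matrix, axis=1, keepdims=True)
    proj_unit = projected / np.linalg.norm(projected)
    sims = tgt_unit @ proj_unit
    top = np.argsort(-sims)[:k]
    return [(tgt_words[i], float(sims[i])) for i in top]


# Toy seed dictionary: the target space is the source space rotated by 90 degrees.
src_seed = np.array([[1.0, 0.0], [0.0, 1.0]])
tgt_seed = np.array([[0.0, 1.0], [-1.0, 0.0]])
W = fit_projection(src_seed, tgt_seed)

# Hypothetical target-language vocabulary with toy vectors.
tgt_words = ["dog", "cat", "bird", "fish"]
tgt_matrix = np.array([[0.0, 1.0], [-1.0, 0.0], [-1.0, 1.0], [1.0, -1.0]])

# Vector of a source word that never appeared in the parallel training data.
unseen_src_vec = np.array([1.0, 1.0])
candidates = translation_candidates(unseen_src_vec, W, tgt_matrix, tgt_words, k=2)
print(candidates[0][0])
```

In a real setting the vectors would come from monolingual corpora in each language, and the returned candidates would presumably be added to the translation model as new entries for the unseen word; how they are scored and integrated is not specified in the abstract.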

Similar resources

EFL Translation Students' Perspective toward Using Bilingual Dictionary in Translation of Polysemous Words

This research presented the use of a bilingual dictionary and addressed EFL translation students' points of view on the use of a bilingual dictionary in translating polysemous words (English to Persian). Moreover, it aimed at finding the possible relationship between the effect of students' use of a bilingual dictionary in translating polysemous words and their achieved scores. In the study ...


s-Topological vector spaces

In this paper, we have defined and studied a generalized form of topological vector spaces called s-topological vector spaces. s-topological vector spaces are defined by using semi-open sets and semi-continuity in the sense of Levine. Along with other results, it is proved that every s-topological vector space is a generalized homogeneous space. Every open subspace of an s-topological vector space is...


Context-dependent word representation for neural machine translation

We first observe a potential weakness of continuous vector representations of symbols in neural machine translation. That is, the continuous vector representation, or a word embedding vector, of a symbol encodes multiple dimensions of similarity, equivalent to encoding more than one meaning of the word. This has the consequence that the encoder and decoder recurrent networks in neural machine t...


Norms of Translating Taboo Words and Concepts from English into Persian after the Islamic Revolution in Iran

The research attempted to discover the norms of translating taboo words and concepts after the Islamic Revolution in Iran using Toury’s (1995) framework for classification of norms. The corpus of the study consisted of Coelho’s novels between 1990 and 2005 and their Persian translations, which were prepared and analyzed manually to discover the norms. During both the selection of novels for trans...


Machine translation in continuous space

We present a different perspective on the machine translation problem that relies upon continuous-space probabilistic models for words and phrases. Within this perspective we propose a method called Tied-Mixture Machine Translation (TMMT) that uses a trainable parametric model employing Gaussian mixture probability density functions to represent word- and phrase-pairs. In the new perspective, ma...



Publication date: 2016